sd: sync to master-591-331cfa5#2155
LostRuins merged 3 commits into LostRuins:concedo_experimental
Conversation
_force-pushed from 3a84f93 to 74cd645_
Updated to master-591. We got:

- support for High-Res Fix (which shouldn't change anything until we add API and UI support on our end);
- structured metadata on generated images (doesn't matter for us, although we could consider a similar approach);
- a memory-handling fix for image-to-image with tiling;
- and the big one: removal of all build-time backend dependencies from the sd.cpp sources.

The backend change is very incompatible with many ggml includes: I've had to remove … On the other hand, since sd.cpp became backend-agnostic, …
Oof, the backend change does not look very pleasant, even though I can see why you added it. How exactly is backend determination supposed to be done now? Currently, when selecting a desired backend (via compile flags for that target), the selected backend is expected to be used for everything (e.g. if I pick CPU, even though I might have CUDA or Vulkan, I expect CPU to be used for TTS, LLM, and image gen). I do not want automatic backend determination from the C++ side; kcpp backend selection should always be explicit (which means it must use pure CPU, or pure Vulkan with a specified device, if the user chooses it, even though I launched the CUDA build with CUDA available and supported). I think since ggml is built with the expected flags this should be doable, but I'm very unclear on how this auto-selection works right now. Anyway, to make this easier, can we do a merge before master-593 first while we figure out a good solution?
(I'll reply about the backend issue later)
Sure, I'll do another merge with |
_force-pushed from 74cd645 to d4dba49_
Done.
We first pick the device name with this:

```cpp
__STATIC_INLINE__ std::string get_default_backend_name() {
    ggml_backend_load_all_once();
    // should pick the same backend as ggml_backend_init_best
    ggml_backend_dev_t dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_GPU);
    dev = dev ? dev : ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_IGPU);
    dev = dev ? dev : ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU);
    if (dev == nullptr) {
        return "";
    }
    return ggml_backend_dev_name(dev);
}
```

then initialize it by name.
It still works like this: only two backend types will be exposed by ggml at a time (Vulkan and CPU, ROCm and CPU, etc.), so it'll prefer the first discrete GPU, then the first iGPU, and use it for everything (the `--x-on-cpu` flags still work the same way as before).
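To make the fallback order above concrete (first discrete GPU, then iGPU, then CPU), here is a self-contained sketch that mimics the selection over a mock device list. The `Dev` struct and `pick_default` helper are illustrative stand-ins, not part of the ggml API:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mock stand-ins for ggml's device types (illustrative only).
enum class DevType { GPU, IGPU, CPU };
struct Dev { DevType type; std::string name; };

// Mirrors the fallback in get_default_backend_name():
// first discrete GPU, then first iGPU, then CPU.
std::string pick_default(const std::vector<Dev>& devs) {
    for (DevType want : {DevType::GPU, DevType::IGPU, DevType::CPU}) {
        for (const auto& d : devs) {
            if (d.type == want) return d.name;
        }
    }
    return "";  // no device found at all
}
```

With a list like `{{IGPU, "Vulkan0"}, {GPU, "Vulkan1"}}` this returns `"Vulkan1"`: the dGPU wins even when it enumerates after the iGPU, which matches the "correctly picks the dGPU regardless of slot order" behavior described below.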
In the sense of, say, Vulkan0 versus Vulkan1, I think the selection has changed: I have a lot of scripts to convince everybody that my dGPU is at a specific slot (it often switches places with the iGPU at boot 😕), and sd.cpp now seems to correctly pick the dGPU regardless of them. But note: even if the previous algorithm was just "pick the first device", technically it was already the C++ layer choosing it (unless forced with …).

We will likely get more sophisticated selection soon (like "Vulkan1 for the diffusion, Vulkan0 for the VAE, CPU for the conditioner"), but we are not there yet: it's still the same backend for everything, except for `--x-on-cpu`. I made a hackish adaptation for …

This way, if the Python layer decides it wants to force a device, let C++ pick whatever, or even support fancy aliases like "iGPU", it can. In fact, we could even add support for placing models on different devices before upstream: the diff would be pretty simple at this point.
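A minimal sketch of what "the Python layer forces a device" could look like, assuming the C++ side accepts a device name string (empty meaning "let C++ pick"). The helper name and alias scheme are hypothetical, not actual kcpp API:

```python
# Hypothetical sketch: resolve a user-facing backend choice to a ggml
# device name before passing it down to the C++ layer. The function
# name and alias conventions are illustrative, not real kcpp code.

def resolve_backend_alias(alias: str, available: list[str]) -> str:
    """Map a user choice ('Vulkan1', 'cpu', or '' for auto) to a device name."""
    if alias == "":
        return ""  # empty string: let the C++ side pick its default
    if alias in available:
        return alias  # exact device name match, e.g. "Vulkan1"
    # case-insensitive match as a convenience (e.g. "cpu" -> "CPU")
    for dev in available:
        if dev.lower() == alias.lower():
            return dev
    raise ValueError(f"unknown backend {alias!r}; available: {available}")
```

The point of the design is that all policy (explicit selection, aliases, auto) lives in Python, and C++ only ever sees a concrete name or "pick your default".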
I cannot remember which model I originally tested that had mis-named tensors. Tested an extended set of my usual model collections on Windows; all seem to work. Btw, any idea why leejet keeps doing these massive refactors? Kinda sad we get this instead of new model support lol
Nice. Maybe we could send a PR upstream with the …
Some results make sense: e.g. the modularization on this one helped with legacy .pth support, and likely with the gguf -> safetensors conversion. I just wish they weren't so disruptive...
As I've mentioned back on #2150, we had a big refactor on the model I/O code.

- It touched several `sd_get_u8path` calls. I've moved them to a wrapper around the top-level `ModelLoader::init_from_file`; I think it even covers more code paths now. But I have no way to test it on Windows (note the call inside `ModelLoader::load_tensors` isn't needed because it uses a previously-stored path).
- The safetensors reading got changed, and does not apply the tensor prefix together with the file reading anymore. I've moved the `kcpp_fix_wrong_img_tensor_name` call to what looks like the right place, but that needs to be tested with an affected .safetensors file.
- The main issue is that the source includes in `sdtype_adapter.cpp` only worked after pretty heavy gymnastics (internal function name collisions). We need to seriously consider a more "normal" approach for building: .cpp -> .o -> link, an intermediate static library, or just passing the sources directly on the command line.
- There is also a scheduler change that went missing from the last sync.
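For the intermediate static-library option, a rough sketch of what the build could look like — all paths, file names, and target names here are assumptions for illustration, not the actual kcpp build layout:

```cmake
# Hypothetical sketch: compile the sd.cpp sources as a separate static
# library instead of #include-ing them into sdtype_adapter.cpp, which
# avoids the internal function-name collisions. Names are illustrative.
add_library(sdcpp_core STATIC
    otherarch/sdcpp/stable-diffusion.cpp   # assumed source layout
    otherarch/sdcpp/model.cpp
)
target_include_directories(sdcpp_core PRIVATE ggml/include)
target_link_libraries(sdcpp_core PRIVATE ggml)

# The adapter then links against the library instead of including sources.
add_library(sdtype_adapter STATIC otherarch/sdcpp/sdtype_adapter.cpp)
target_link_libraries(sdtype_adapter PRIVATE sdcpp_core)
```

This keeps each translation unit's internal (static) symbols private to its own object file, so colliding helper names in sd.cpp and the adapter no longer clash at compile time.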